(54) Methods and apparatus for performing hash operations in a cryptography accelerator



(57) Methods and apparatus are provided for imple- 
menting a cryptography accelerator for performing op- 
erations such as hash operations. The cryptography ac- 
celerator recognizes characteristics associated with in- 
put data and retrieves an instruction set for processing 
the input data. The instruction set is used to configure 
or control components such as MD5 and SHA-1 hash 
cores, XOR components, memory, etc. By providing a
cryptography accelerator with access to multiple in- 
struction sets, a variety of hash operations can be per- 
formed in a configurable cryptographic accelerator. 
Description

Cross Reference to Related Applications 

[0001] This application claims priority under 35 U.S.C.
119(e) from U.S. Provisional Application No. 
60/368,583, entitled "Methods And Apparatus For Im- 
plementing A Configurable Authentication Accelerator," 
as of filing on March 28, 2002, the disclosure of which 
is herein incorporated by reference for all purposes. 

Background of the Invention 

1. Field of the Invention

[0002] The present application relates to implementing
a cryptography accelerator. More specifically, the
present application relates to methods and apparatus 
for providing a configurable cryptography accelerator 
with instruction sets for performing hash operations on 
input data. 

2. Description of Related Art

[0003] Conventional software and hardware designs 
for performing hash operations are inefficient. One tech- 
nique for securing a communication channel between 
two network entities such as a client and a server spec- 
ifies that the two entities perform a cryptography hand- 
shake sequence. During the cryptographic handshake 
sequence, the two network entities will typically perform
various cryptographic operations such as encryption
and authentication operations to verify the identity of the
other and to exchange information to establish a secure 
channel. 

[0004] In one example, session keys are exchanged 
after the identity of the other network entity is verified. 
However, software, firmware, and hardware techniques
for performing hash operations, such as hash operations
used in cryptography handshake sequences, have been
inefficient and resource intensive. Cryptography
handshake sequences and hash algorithms are
described in Applied Cryptography, Bruce Schneier,
John Wiley & Sons, Inc. (ISBN 0471128457), incorporated
by reference in its entirety for all purposes.
[0005] It is therefore desirable to provide methods and 
apparatus for improving hash operations with respect to 
some or all of the performance limitations noted above. 

Summary of the Invention 

[0006] Methods and apparatus are provided for implementing
a cryptography accelerator for performing operations
such as hash operations. The cryptography accelerator
recognizes characteristics associated with input
data and retrieves an instruction set for processing
the input data. The instruction set is used to configure
or control components such as MD5 and SHA-1 hash
cores, XOR components, memory, etc. By providing a
cryptography accelerator with access to multiple instruction
sets, a variety of hash operations can be performed
in a configurable cryptographic accelerator.
[0007] According to various embodiments, a cryptography
accelerator for performing hash operations is provided.
The accelerator includes a first hash core, a persistent
memory, and a temporary memory. The first hash
core is operable to perform a plurality of rounds of hash
computations on input data to derive processed data. The
persistent memory contains a plurality of instruction
sets. The plurality of instruction sets provide information
for the first hash core on operations to perform on input
data and intermediate data during the plurality of rounds
of hash computations. The temporary memory is coupled
to the first hash core. The temporary memory is
operable to hold input data and intermediate data.
[0008] According to other embodiments, a method for
performing hash operations is provided. Input data is
received. Characteristics associated with the input data
are determined. An instruction set for performing hash
operations on input data is selected. The instruction set
is selected from a plurality of instruction sets maintained
in persistent memory associated with a first hash core.
The first hash core is configured using the instruction
set. The first hash core is operable to perform hash
operations on input data based on the instruction set.
[0009] These and other features and advantages of
the present invention will be presented in more detail in
the following specification of the invention and the
accompanying figures, which illustrate, by way of example,
the principles of the invention.

Brief Description of the Drawings

[0010] The invention may best be understood by ref- 
erence to the following description taken in conjunction 
with the accompanying drawings, which are illustrative 
of specific embodiments of the present invention. 

Figure 1 is a diagrammatic representation of a system
that can use the techniques of the present invention.
Figure 2A is a diagrammatic representation of an
integrated circuit containing a processing core for
performing hash operations.
Figure 2B is a diagrammatic representation showing
a structure referencing instruction sequences.
Figure 2C is a diagrammatic representation of
mechanisms for determining resource dependencies.
Figure 3 is an interaction diagram showing a sequence
in which the techniques of the present invention
can be applied.
Figure 4 is a diagrammatic representation showing
input data, intermediate data, and processed data.
Figure 5 is a flow process diagram showing techniques
for selecting an instruction set.
Figure 6 is a flow process diagram showing TLS 1.0
key derivation.
Figure 7 is a flow process diagram showing SSLv3
key derivation.
Figure 8 is a flow process diagram showing TLS 1.0
finished message generation.
Figure 9 is a flow process diagram showing SSLv3
finished message generation.

Detailed Description of Specific Embodiments 

[0011] The present application relates to implement- 
ing a cryptography accelerator. More specifically, the 
present application relates to methods and apparatus
for providing a cryptography accelerator capable of per- 
forming a variety of different hash operations on input 
data. 

[0012] Reference will now be made in detail to some
specific embodiments of the invention including the best
modes contemplated by the inventors for carrying out
the invention. Examples of these specific embodiments
are illustrated in the accompanying drawings. While the
invention is described in conjunction with these specific
embodiments, it will be understood that it is not intended
to limit the invention to the described embodiments. On
the contrary, it is intended to cover alternatives,
modifications, and equivalents as may be included within the
spirit and scope of the invention as defined by the
appended claims.

[0013] For example, the techniques of the present invention
will be described in the context of the SHA-1 and
MD5 hash algorithms. However, it should be noted that
the techniques of the present invention can be applied
to a variety of different hash operations for cryptography
processing in general. In the following description,
numerous specific details are set forth in order to provide
a thorough understanding of the present invention. The
present invention may be practiced without some or all
of these specific details. In other instances, well known
process operations have not been described in detail in
order not to unnecessarily obscure the present invention.

[0014] A wide variety of algorithms are used for encryption
and authentication operations. In many conventional
implementations, software is used to identify
the type of data and the cryptographic processing needed
for the particular data sequence. However, cryptographic
operations implemented entirely in software on
a generic processor such as a reduced instruction set
(RISC) or complex instruction set (CISC) processor are
highly inefficient. In many environments, it is beneficial
to use specialized accelerators for performing cryptographic
operations, such as DES and SHA-1 operations.
In typical cryptography accelerator implementations,
a cryptography accelerator is configured to perform
resource intensive cryptographic operations while
software through an external host is configured to perform
sequencing. That is, software formats and sequences
data and makes function calls to elementary
cryptographic operators. In one example, a cryptography
accelerator would be responsible for executing a
function such as cryptooperation(data, key1, key2) while
the software would be responsible for formatting the data
properly, acquiring the keys, and making multiple calls
to the function when necessary.
[0015] More recent efforts have focused on implementing
both core processing as well as formatting and
sequencing on a cryptography accelerator. In one example,
software running on a host such as a CPU external
to a cryptography accelerator could simply forward
a packet to the cryptography accelerator. Using the
packet, the cryptography accelerator would extract
information to determine what type of processing and how
many rounds of processing need to be performed.
[0016] One technique for implementing such a cryptography
accelerator that performs both cryptography
processing and sequencing uses state tables. Each
load or store instruction on the cryptography accelerator
is represented by one or more states. However, because
many variations in cryptographic algorithms exist,
a large number of states exist. Having a significant
number of states makes implementation and verification
extremely difficult. Furthermore, if a new cryptographic
algorithm is developed, substantial work would have to
be performed in order to update the states associated
with the instructions.

[0017] Consequently, the techniques of the present
invention provide sequences of instructions for performing
cryptographic as well as sequencing operations on
data. Instruction sequences can relatively easily be
implemented for particular cryptographic operations.
When a new algorithm is developed, an additional
instruction sequence can be provided on the cryptography
accelerator. The variations between cryptographic
algorithms can be handled with relative ease. The techniques
and mechanisms of the present invention allow
for a cryptographic accelerator that has the speed and
processing advantages of a customized piece of hardware
while retaining the flexibility of a piece of software.
[0018] Figure 1 is a diagrammatic representation of
one example of a processing system 100 with a cryptography
accelerator according to various embodiments
of the present invention. As shown in Figure 1, the
present invention may be implemented in a stand-alone
cryptography accelerator 102 or as part of the system
100. In the described embodiment, the cryptography
accelerator 102 is connected to a bus 104 such as a PCI
bus via a standard on-chip PCI interface. The processing
system 100 includes a processing unit 106 and a
system memory unit 108. The processing unit 106 and
the system memory unit 108 are coupled to the system
bus 104 via a bridge and memory controller 110.

[0019] According to various embodiments, the
processing unit 106 may be the central processing unit
(CPU) of a system 100. In one example, a LAN interface
114 is provided to couple the processing system 100 to
a local area network (LAN) to allow packet receipt and
transmission. Similarly, a Wide Area Network (WAN)
interface 112 can also be provided to connect the processing
system to a WAN (not shown) such as the Internet.
The WAN interface manages in-bound and out-bound
packets, providing automatic cryptographic processing
for IP packets.

[0020] In many implementations, the cryptography
accelerator 102 is an application specific integrated circuit
(ASIC) coupled to the processor 106. However, the
cryptography accelerator 102 can also be a programmable
logic device (PLD), field programmable gate array
(FPGA), or other device coupled to the processor
106. According to specific embodiments, the cryptography
accelerator 102 is implemented either on a card
connected to the bus 104 or as a standalone chip integrated
in the system 100.

[0021] In other embodiments, the cryptography accelerator
102 itself is integrated into the processing core of
a CPU of system 100, such as that available from Tensilica
Corporation of Santa Clara, California or ARC
Cores of San Jose, California. In another embodiment,
techniques and mechanisms of the present invention
are integrated into a CPU such as a CPU available from
Intel Corporation of San Jose, California or AMD Corporation
of Sunnyvale, California. By implementing
cryptography accelerator functionality entirely on the
processor 106, a separate card or chip in the system
100 is not needed. In still other embodiments, the
processing system 100 including the cryptography accelerator
102 is implemented as a system on a chip
(SOC). The network interfaces, memory, processing
core, and cryptography accelerator functionality are provided
on a single integrated circuit device.
[0022] The cryptography accelerator 102 is capable
of implementing various network security standards,
such as Internet Protocol Security (IPSec), Secure
Sockets Layer/Transport Layer Security (SSL/TLS), and
Internet Key Exchange (IKE), which provide application-transparent
encryption and authentication services for
network traffic.

[0023] Network security standards such as IPsec and 
SSL/TLS provide authentication through the use of hash 
algorithms. Two commonly used hash algorithms are 
MD5 and the Secure Hash Algorithm (SHA-1). Other
hash algorithms such as MD4 and MD2 are also avail- 
able. Hash algorithms are described in Applied Cryptog- 
raphy, Bruce Schneier, John Wiley & Sons, Inc. (ISBN 
0471128457), incorporated by reference in its entirety 
for all purposes. Even though many network security 
standards apply the same hash algorithms, different ap- 
proaches are taken toward applying the hash algorithms 
to the actual authentication computation. 
[0024] Different versions of the same network security
standards even vary approaches toward applying the
hash algorithms. In IPsec, several approaches such as
HMAC-MD5-96 and HMAC-SHA1-96 based on the
hash message authentication code (HMAC) algorithm
are provided. The approaches HMAC-MD5-96 and
HMAC-SHA1-96 are described in RFC 2403 and RFC
2404 respectively, while the HMAC algorithm is described
in RFC 2104, the entireties of which are incorporated
by reference for all purposes. SSL/TLS use similar,
but slightly different approaches. In SSLv3, an earlier
version of HMAC is used. In TLS 1.0, the same version
of HMAC is used as in IPsec, but a different number
of bits are taken for the full result.
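
As an illustration of the truncation that distinguishes these variants, the following Python sketch computes HMAC-MD5-96 and HMAC-SHA1-96 by keeping the leftmost 96 bits (12 bytes) of the full HMAC output. It uses the standard-library hmac and hashlib modules as stand-ins for hardware hash cores; the helper name hmac_96 is an assumption introduced for this example.

```python
import hashlib
import hmac


def hmac_96(key: bytes, message: bytes, hash_name: str) -> bytes:
    """Return the leftmost 96 bits of HMAC(key, message).

    hash_name is "md5" for HMAC-MD5-96 or "sha1" for HMAC-SHA1-96.
    """
    full = hmac.new(key, message, hash_name).digest()  # 16 or 20 bytes
    return full[:12]                                   # keep 96 bits


if __name__ == "__main__":
    print(hmac_96(b"\x0b" * 16, b"Hi There", "md5").hex())
    print(hmac_96(b"\x0b" * 20, b"Hi There", "sha1").hex())
```

Only the final slice differs between a full-length HMAC and the 96-bit variants, which is why a single hash core can serve several protocols when the sequencing around it is configurable.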

[0025] The TLS 1.0 protocol is described in RFC
2246, the entirety of which is incorporated by reference
for all purposes. SSL is described in E. Rescorla, SSL
and TLS: Designing and Building Secure Systems
(Addison-Wesley, 2001) and S.A. Thomas, SSL & TLS
Essentials: Securing the Web (John Wiley & Sons, Inc.
2000), the entireties of which are incorporated by reference
for all purposes. In addition, SSL/TLS define a set
of functions using a combination of HMAC, MD5, and
SHA1 to generate processed data. For example, combinations
are used to generate a master secret sequence
from a premaster secret sequence, to generate
key blocks from a master secret sequence, or to perform
hash operations for finished message processing and
client certificate verification.

Typical cryptography accelerators use hash cores for
performing hash operations. When a client or server
participates in an authentication sequence such as a
key exchange, clients and servers need cryptography
accelerators specifically configured for particular versions
of specified network security standards. In one example,
if the server needs to perform TLS 1.0 operations,
a cryptography accelerator such as an ASIC specifically
microcoded with a TLS 1.0 instruction set would
be required. A state machine can be used to perform
operations associated with each network security standard
version. As noted above, however, a state machine
that can handle the number of standards in existence
would be extremely complicated and difficult to implement.

[0026] Consequently, many cryptography accelerators
typically contain only functionality for performing basic
hash operations such as MD5 or SHA1 operations.
Authentication specific functionality on a cryptography
accelerator is often limited to MD5 or SHA1 hash cores.
The external processor such as an external CPU would
pass data to a cryptography accelerator when MD5 or
SHA1 processing was needed. In one example, if a network
security standard specified repeated calls to an
MD5 or SHA1 function, the external processor would
pass data to the cryptography accelerator during each
function call, receive data output by the cryptography
accelerator, and alter data as needed before passing the
data back to the cryptography accelerator for another
function call.

[0027] Typically, only a single hash function call would
be performed on data before sending the data back to 
an external processor. In another example, if XOR operations
were specified for data output from the MD5
and SHA1 cores, the external processor would perform
the XOR operations even if the XOR operations were a 
specific part of the cryptographic processing. Because 
of inefficiencies such as the passing of data between 
the external processor and the cryptography accelerator 
between function calls, cryptographic processing for a 
server or client expecting many different versions of net- 
work security protocols has been limited. 
[0028] The techniques of the present invention, however,
provide not only a cryptography accelerator
specifically configured for a particular type of hash operation,
without the need to send and receive data to and from
an external processor between various calls to a particular
function implemented on a chip, but also an automatically
configurable cryptographic accelerator that recognizes
characteristics of the input data and automatically performs
cryptographic processing such as SSLv3 or TLS 1.0 key
derivation.

[0029] It should be noted that recognizing character- 
istics of the input data can include operations such as 
analyzing the input data, retrieving information associated
with the input data, or recognizing characteristics
of instruction sequences associated with the input data. 
A single cryptographic accelerator, for example, with an 
MD5 and a SHA1 core can perform cryptographic 
processing associated with a variety of operations using 
the MD5 and SHA1 hash operations. In one example, 
the cryptography accelerator can perform cryptographic 
operations associated with IPsec and SSL/TLS 
processing. 

[0030] Figure 2A is a diagrammatic representation of
one example of a cryptography accelerator according to
various embodiments. The cryptography accelerator includes
an interface having a parser 203 coupled to an
entity such as an external processor for receiving and
delineating input data sequences. In one example, the
parser 203 receives a data sequence associated with
SSLv3 key derivation. The control logic 233 determines
that key derivation operations associated with SSLv3
should be performed on the data sequence. The control
logic 233 retrieves an instruction set associated with
SSLv3 key derivation from persistent memory 205.
Memory that retains data after hash operations are completed
is referred to herein as persistent memory. Persistent
memory also typically remains intact when power
is disconnected. In one embodiment, persistent memory
is a read-only memory (ROM) on a cryptography accelerator
chip, although persistent memory can also be
components such as flash memory. In another embodiment,
persistent memory 205 and temporary memory
221 are contained in the same component. A component
such as a random access memory (RAM) can be
loaded with instruction sets and can provide the capability
to function as both a persistent memory and as a
temporary memory, although such access may be slow.
[0031] According to various embodiments, persistent
memory 205 includes a table with various types of operations
and security protocols identified in the entries.
The entries correspond to instruction sets for configuring
the cryptography accelerator. Logic and mechanisms
for configuring a cryptography accelerator for performing
a particular type of cryptographic operation
such as key derivation or finished message processing
are referred to herein as an instruction set. The fetch engine
207 retrieves the instruction set from persistent
memory 205. According to various embodiments, the
decoder 209 receives and interprets the instruction set
for control logic 233. In one embodiment, control logic
233 retrieves microcode for performing cryptographic
operations on an input data sequence. Logic and mechanisms
for configuring or managing components such
as hash cores for authentication processing are herein
referred to as control logic. In one example, control logic
manages cryptographic processing in components such
as hash core 223, hash core 225, and temporary memory
221.

[0032] In one embodiment, hash cores 223 and 225
as well as temporary memory 221 also receive input data
from parser 203. After a round of processing in hash
core 223 or hash core 225, data can be provided to temporary
memory 221. Input data that has undergone one
or more rounds of hash operations is referred to herein
as intermediate data. Temporary memory 221 can store
the intermediate data and subsequently provide the intermediate
data for additional rounds of hash processing
through output port 281 to the input ports 273 and
275 associated with hash cores 223 and 225. According
to various embodiments, hash cores 223 and 225 both
are capable of performing either MD5 or SHA-1
processing. After the specified number of rounds of
hash processing have occurred as determined by the
control logic 233, hash cores 223 and 225 can provide
the final or processed data through output ports 283 and
285 to merger component 241. Merger component 241
can then send the processed data to the external entity.
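
The progression from input data to intermediate data to processed data can be sketched in software. The Python example below uses SSLv3 master secret derivation as the workload, on the assumption that it is representative of the multi-round processing described here: each inner SHA-1 digest plays the role of intermediate data held in temporary memory 221 and is then fed to an MD5 round. The function name and the use of hashlib in place of hash cores 223 and 225 are illustrative assumptions, not the accelerator's actual instruction set.

```python
import hashlib


def sslv3_master_secret(pre_master: bytes, client_random: bytes,
                        server_random: bytes) -> bytes:
    """Derive a 48-byte SSLv3 master secret with chained SHA-1 and MD5 rounds."""
    processed = b""
    for label in (b"A", b"BB", b"CCC"):
        # First round (SHA-1 core): the digest is intermediate data that
        # would be parked in temporary memory between rounds.
        intermediate = hashlib.sha1(
            label + pre_master + client_random + server_random
        ).digest()
        # Second round (MD5 core): consumes the intermediate data.
        processed += hashlib.md5(pre_master + intermediate).digest()
    return processed  # three 16-byte MD5 outputs concatenated
```

In a host-driven design each of the six hash calls would be a separate round trip to the accelerator; keeping the intermediate data on chip removes those transfers.
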
[0033] According to various embodiments, components
for performing other operations such as XOR operations
are also included in the cryptography accelerator.
In one example, the XOR component is coupled to
the output ports 283 and 285 so that SHA-1 and MD5
processed data can be combined together. It should be
noted that the cryptography accelerator can include a
number of other components including cryptography
blocks such as DES, triple DES, and RC4 cores. The
cryptography accelerator can include encryption functionality,
central processing cores, bypass circuitry, etc.
[0034] Figure 2B is a diagrammatic representation
providing one example of how an instruction sequence is
provided to control logic 233. In one example, a parser
loads a program counter with a pointer from a vector
pointer table 211. Each pointer 213, 215, 217, and 219
may be configured to refer to an instruction sequence
251, 253, 255, and 257. According to various embodiments,
each instruction sequence is a sequence of
loads, stores, moves, sets, etc., for performing cryptographic
operations. In one example, the fetch engine
gets the sequence of instructions from persistent memory
as long as there is room in an instruction queue. The
instructions are decoded in order to determine resource
dependencies to allow instructions to be executed out
of order. According to various embodiments, several
hash engines are provided in a cryptographic accelerator
and instructions are performed as resources become
available. Consequently, mechanisms are provided to
track the resource dependencies. In some examples,
resources include memory ports, hash engine ports,
and counters.
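
The relationship between the vector pointer table, the program counter, the fetch engine, and the instruction queue can be sketched as a small interpreter. The Python example below is a minimal model under stated assumptions: the opcodes (LOAD, HASH, STORE, END), the table keys, and the queue depth are hypothetical, and instructions execute in order, so the out-of-order issue described above is not modeled.

```python
import hashlib
from collections import deque

# Stand-in for persistent memory: instruction sequences stored back to back.
PERSISTENT_MEMORY = [
    ("LOAD", "input"), ("HASH", "md5"), ("STORE", "out"), ("END",),   # starts at 0
    ("LOAD", "input"), ("HASH", "sha1"), ("HASH", "md5"),
    ("STORE", "out"), ("END",),                                        # starts at 4
]

# Stand-in for the vector pointer table: each pointer refers to a sequence.
VECTOR_POINTER_TABLE = {"md5_once": 0, "sha1_then_md5": 4}


def run(vector: str, data: bytes, queue_depth: int = 2) -> bytes:
    pc = VECTOR_POINTER_TABLE[vector]   # parser loads the program counter
    queue = deque()                     # instruction queue
    memory = {"input": data}            # stand-in for temporary memory
    register = b""
    fetched_end = False
    while True:
        # Fetch engine: keep fetching while the queue has room.
        while not fetched_end and len(queue) < queue_depth:
            instruction = PERSISTENT_MEMORY[pc]
            pc += 1
            queue.append(instruction)
            fetched_end = instruction[0] == "END"
        opcode, *operands = queue.popleft()   # decode and execute
        if opcode == "LOAD":
            register = memory[operands[0]]
        elif opcode == "HASH":
            register = hashlib.new(operands[0], register).digest()
        elif opcode == "STORE":
            memory[operands[0]] = register
        elif opcode == "END":
            return memory["out"]
```

Supporting a new operation in this model means appending a sequence to the instruction store and adding one pointer, which mirrors the flexibility argument made for instruction sequences over state tables.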

[0035] Figure 2C is a diagrammatic representation of
one example of a mechanism for tracking resource dependencies.
According to various embodiments, global
resource vector 240 indicates which resources are being
used. In one example, resource 242 represents a
memory input port being used and resource 244 represents
a hash engine input port that is in use. Dependency
vector 260 shows which resources are needed for
a particular instruction in an instruction sequence. In one
example, resources 264 and 266 indicate that the hash
engine input port and the hash engine output port are
needed for the instruction to execute. Consequently, the
instruction may not execute until the global resource
vector returns to a state shown in vector 280, when resources
284 and 286 representing the hash engine input
and output ports become available.
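
One way to model this check is with bit vectors: an instruction may issue only when none of the bits in its dependency vector are set in the global resource vector. The Python sketch below assumes a particular bit assignment and helper names (can_issue, issue, release) that are not taken from the figures.

```python
# Hypothetical bit positions for the tracked resources.
MEM_IN, MEM_OUT, HASH_IN, HASH_OUT, COUNTER = (1 << i for i in range(5))


def can_issue(global_vector: int, dependency_vector: int) -> bool:
    """True when no needed resource is currently in use."""
    return (global_vector & dependency_vector) == 0


def issue(global_vector: int, dependency_vector: int) -> int:
    """Mark the instruction's resources as in use."""
    return global_vector | dependency_vector


def release(global_vector: int, dependency_vector: int) -> int:
    """Mark the instruction's resources as available again."""
    return global_vector & ~dependency_vector


# Memory input port and hash engine input port are busy (like vector 240).
busy = MEM_IN | HASH_IN
# The instruction needs both hash engine ports (like vector 260).
needs = HASH_IN | HASH_OUT

blocked = not can_issue(busy, needs)          # hash input port still in use
later = release(busy, HASH_IN)                # hash input port freed (like vector 280)
ready = can_issue(later, needs)
```
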
[0036] Figure 3 shows one example of a cryptographic
handshake sequence between a client 301 and a
server 303. A wide variety of cryptographic handshake
sequences associated with key exchanges are available.
Figure 3 is merely one example of a handshake. At
311, the client 301 transmits a message with a security
enable parameter to a server 303. The authentication
message contains an identifier such as a user name or
an authentication identifier that allows the receiver to select
an authentication mechanism out of a possible set
of mechanisms. According to various embodiments,
server 303 already has information associated with the
client. The server 303 identifies the security enable parameter
along with any client proposed algorithms and
transmits an acknowledgement at 315 to client 301 indicating
the selection of an algorithm.
[0037] As noted above, a client 301 transmits a user 
name to a server 303 and a server 303 at 315 transmits 
a value such as a salt associated with the user name 
back to the client 301. According to other embodiments,
protocol version, session ID, cipher suite, and compres- 
sion method are exchanged along with a client random 
value and a server random value. 
[0038] At 317, client 301 computes the combined
hash using the salt and the actual password associated
with the user name. According to various embodiments,
the client 301 then provides public information at 321 to
server 303. Similarly, server 303 at 325 provides public
information to client 301. Information that would not
compromise security between a client and a server if
accessed by a third party is referred to herein as public
information. At 327, both client 301 and server 303 can
derive a common value such as a common symmetric
key using values available to each of them. Many techniques
for key derivation are available. According to various
embodiments, a cryptographic accelerator with
hash cores is capable of deriving keys based on selected
algorithms in a highly efficient manner.
[0039] For example, client 301 generates a common
key using public information from server 303, its own
private information used to generate public information
provided to server 303, and the combined hash calculated
by operating on the password appended to a salt.
Similarly, server 303 generates a symmetric key by using
public information from client 301, a verifier derived
from the hash of the combined salt and password, and
private information used to generate public information
provided to client 301. If the password used to derive
the verifier at server 303 is the same as the password
used to generate the combined hash value at client 301,
the symmetric keys derived at client 301 and server 303
will be the same.
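
The combined hash that both sides depend on can be sketched as follows. This Python example only computes the hash of the password appended to the salt; the choice of SHA-1, the example salt, and the function name are assumptions, and the derivation of the verifier and of the symmetric key from public and private information is not modeled.

```python
import hashlib


def combined_hash(salt: bytes, password: bytes) -> bytes:
    """Hash of the password appended to the salt."""
    return hashlib.sha1(salt + password).digest()


# Hypothetical salt associated with a user name.
salt = bytes.fromhex("00112233445566778899aabbccddeeff")

# Client side: computed from the password the user actually typed.
client_value = combined_hash(salt, b"correct horse")
# Server side: the value the verifier was originally derived from.
server_value = combined_hash(salt, b"correct horse")
```

When the two passwords match, both sides start from the same combined hash, which is the precondition the paragraph above gives for the derived symmetric keys to agree.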

[0040] According to various embodiments, the session
key can be used for communications between client
301 and server 303. It should be noted that a variety of
different cryptographic handshake sequences and communication
sequences in general can use the techniques
of the present invention. For example, a session
key can further be hashed to derive a possibly stronger
session key.

[0041] At 331, client 301 sends a hash of the session
key combined with other public information to server
303. The server 303 then performs a hash of the derived
session key combined with the other information known
to server 303 to verify the identity of the client 301. Similarly,
at 335, server 303 sends a hash of the session
key along with other information known to client 301 to
allow client 301 to verify the identity of server 303. According
to various embodiments, a cryptography accelerator
with hash cores according to the techniques of
the present invention makes generation of finished messages
highly efficient.
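
The verification step at 331 and 335 amounts to each side recomputing the same hash and comparing it with the value received. The Python sketch below assumes SHA-1 over the session key concatenated with the public information; the function names and the concatenation order are illustrative assumptions rather than the format of any particular protocol's finished message.

```python
import hashlib
import hmac


def finished_hash(session_key: bytes, public_info: bytes) -> bytes:
    """Hash of the session key combined with other public information."""
    return hashlib.sha1(session_key + public_info).digest()


def verify_peer(received: bytes, session_key: bytes, public_info: bytes) -> bool:
    """Recompute the hash locally and compare it with the received value."""
    expected = finished_hash(session_key, public_info)
    # compare_digest avoids leaking where the first mismatch occurs.
    return hmac.compare_digest(received, expected)
```

A peer that derived a different session key produces a different hash, so the comparison fails without the key itself ever being sent.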

[0042] It should be noted that in the above implementation,
a password is never transmitted over the network.
Instead, both network entities use derivatives of
the password to generate the session key and other
cryptographic information used for secure transmission.
Neither the password nor the session key need ever
be transmitted over the network.

[0043] According to various embodiments, a cryptography
accelerator speeds operations such as key derivation
and finished message generation on both the
server and the client side. It is contemplated that a cryptography
accelerator can be used in any network entity.
It should be noted that the cryptographic handshake sequence
shown in Figure 3 is only one example of a sequence
that can use the mechanisms and techniques
of the present invention.

[0044] Figure 4 is a diagrammatic representation
showing data processing according to various embodiments.
In one embodiment, a premaster secret 401 is
associated with input data provided to a cryptography
accelerator. The cryptography accelerator is used to apply
a pseudo-random function (PRF) 411 to the premaster
secret 401 to derive a master secret 403. A function
that takes one or more inputs and derives an indeterminate
output is referred to herein as a pseudo-random
function. A master secret 403 corresponds to intermediate
data. Another pseudo-random function 413 can be
applied to the master secret 403 to derive final or processed
data such as authentication keys 405, cryptography
keys 407, or initialization vectors 409. The final data
generated varies depending on the protocol, protocol
version, and type of processing requested.
[0045] Figure 5 is a flow process diagram showing
one example of a technique for configuring components
such as hash cores in a cryptography accelerator. At
501, input data is received from a component such as
a parser. According to various embodiments, a parser
organizes the data into a form readable by a hash core.
At 503, characteristics associated with the input data are
determined. Information associated with how to process
input data is referred to herein as characteristics of input
data. Input data can include information such as protocol
version, session ID, cipher suite, and compression
method. In one example, it is determined what algorithm
is being applied to the input data. Algorithms can include
versions of TLS, SSL, and IKE as well as other protocols
and variants of the protocols.

[0046] Determining characteristics can also include
determining what kind of operation is to be applied to
the data. For example, a key may need to be derived
from the data or finished message processing may need
to be performed. At 505, an instruction set is retrieved
from persistent memory based on the characteristics
associated with the input data. Persistent memory may
include multiple instruction sets for configuring processing
of input data in a variety of manners. At 507, a hash core
is configured based on the instruction set. It should be
noted that other components such as XOR processing
components and temporary memory may also be configured
at this point.

[0047] Configuring the components may include loading
microcode associated with the instruction set into
control logic associated with the various hash cores and
configurable components. Alternatively, instructions
such as microcode can be loaded into a single control
logic component associated with the various components.
At 509, input data is processed using the instruction
set. After a round of processing, input data becomes
intermediate data. At 513, intermediate data is maintained
in temporary memory during processing. Using
temporary memory, data can be manipulated, padded,
truncated, etc. At 515, input data and intermediate data
become final or processed data after completion
of processing. The final or processed data is provided
back to a merger component for forwarding to an
external entity such as an external processor.
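
The flow of Figure 5 can be pictured in software as a table lookup followed by a configuration step. The following is only an illustrative sketch: the names `Characteristics`, `PERSISTENT_MEMORY`, `determine_characteristics`, `retrieve_instruction_set`, and `configure_components`, as well as the instruction labels, are assumptions made for this example and do not describe actual microcode or any particular hardware interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristics:
    """Hypothetical summary of how the input data must be processed."""
    protocol: str   # e.g. "TLS1.0" or "SSLv3"
    operation: str  # e.g. "key_derivation" or "finished_message"


# Hypothetical stand-in for persistent memory: one instruction set per
# (protocol, operation) pair. Each instruction names a component and an action.
PERSISTENT_MEMORY = {
    ("TLS1.0", "key_derivation"): [("md5", "p_hash"), ("sha1", "p_hash"), ("xor", "combine")],
    ("SSLv3", "key_derivation"): [("sha1", "inner_hash"), ("md5", "outer_hash")],
    ("TLS1.0", "finished_message"): [("md5", "hash"), ("sha1", "hash"), ("xor", "combine")],
    ("SSLv3", "finished_message"): [("md5", "inner_outer"), ("sha1", "inner_outer")],
}


def determine_characteristics(header: dict) -> Characteristics:
    """Step 503: read the fields that decide how the data is processed."""
    return Characteristics(header["protocol"], header["operation"])


def retrieve_instruction_set(ch: Characteristics) -> list:
    """Step 505: look up the instruction set for these characteristics."""
    return PERSISTENT_MEMORY[(ch.protocol, ch.operation)]


def configure_components(instruction_set: list) -> dict:
    """Step 507: group the actions by the component that will run them."""
    config = {}
    for component, action in instruction_set:
        config.setdefault(component, []).append(action)
    return config


ch = determine_characteristics({"protocol": "SSLv3", "operation": "key_derivation"})
config = configure_components(retrieve_instruction_set(ch))
```

Adding support for another protocol in this sketch amounts to adding an entry to the table, which mirrors the idea that the hash cores stay fixed while the instruction sets vary.
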
[0048] Figure 6 is a process flow diagram showing operations
for performing TLS 1.0 key derivation according
to various embodiments. TLS 1.0 key derivation can be
used during a cryptography handshake sequence such
as that shown in Figure 3. A cryptography accelerator
such as that shown in Figure 2 having an MD5 core and
a SHA-1 core can be used for key derivation. According
to various embodiments, the inputs to the key derivation
operations are a premaster secret, client random information,
and server random information. At 601, client
and server random information is saved. Client/server
random information can be saved in a component such
as temporary memory.

[0049] At 603, the length of the premaster secret is
acquired. At 605, the premaster secret is saved. At 607,
a prehash operation is performed on the premaster secret
using the MD5 and SHA-1 hash cores. At 611, it
is determined if the current session is a new session. If
the current session is a new session, a 48-byte p_MD5
is generated at 613, a 60-byte p_SHA-1 is generated at
615, and the resulting p_MD5 and p_SHA-1 are combined
with an XOR operation to acquire the master secret
key. The 48-byte master secret key is saved at 619.
If it is determined at 611 that the current session is not
a new session, the premaster secret to master secret
generation is skipped.
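
The byte counts above follow from the digest sizes: three MD5 rounds yield 3 x 16 = 48 bytes, while three SHA-1 rounds yield 3 x 20 = 60 bytes, of which the first 48 are used. The sketch below shows this in software, assuming the pseudo-random function defined for TLS 1.0 in RFC 2246; the function names `p_hash` and `tls10_master_secret` are chosen for this example, and the all-zero inputs are placeholders rather than real handshake values.

```python
import hmac


def p_hash(hash_name: str, secret: bytes, seed: bytes, length: int) -> bytes:
    """Run whole HMAC rounds until at least `length` bytes exist.
    The output is deliberately left untruncated to expose the round sizes."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hash_name).digest()            # A(i)
        out += hmac.new(secret, a + seed, hash_name).digest()  # one output round
    return out


def tls10_master_secret(premaster: bytes, client_random: bytes,
                        server_random: bytes) -> bytes:
    # The secret is split into two halves (sharing a byte if the length is odd).
    half = (len(premaster) + 1) // 2
    s1, s2 = premaster[:half], premaster[len(premaster) - half:]
    seed = b"master secret" + client_random + server_random
    p_md5 = p_hash("md5", s1, seed, 48)    # 3 rounds -> 48 bytes
    p_sha1 = p_hash("sha1", s2, seed, 48)  # 3 rounds -> 60 bytes
    # XOR the first 48 bytes of each stream to obtain the master secret.
    return bytes(x ^ y for x, y in zip(p_md5[:48], p_sha1[:48]))


master = tls10_master_secret(bytes(48), bytes(32), bytes(32))
```
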
[0050] At 621, a prehash is performed on the master
secret. At 623, the number of bytes needed for the MD5
and SHA-1 operations is determined. The number of
bytes needed can be determined by control logic, as the
number of bytes needed may be one of the characteristics
of the input data stream. Based on the number of
bytes needed, MD5 and SHA-1 operations are performed
at 625 and 627 using the MD5 and SHA-1 cores
as configured by the control logic. The result is combined
with an XOR at 629. According to various embodiments,
the operations such as sending data to an XOR
component are determined based on an instruction set
selected by the control logic. A persistent memory allows
storage of instruction sets for a variety of operations.
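
As a worked example of the determination at 623, the number of hash rounds can be computed from the number of key bytes required. The sketch below assumes a cipher suite with a 20-byte MAC key, a 24-byte cipher key, and an 8-byte initialization vector per direction (as for triple DES in CBC mode with a SHA-1 MAC); the helper names `key_block_length` and `rounds_needed` are hypothetical.

```python
MD5_DIGEST_BYTES = 16
SHA1_DIGEST_BYTES = 20


def key_block_length(mac_key_len: int, enc_key_len: int, iv_len: int) -> int:
    """Bytes needed for both directions (client write and server write)."""
    return 2 * (mac_key_len + enc_key_len + iv_len)


def rounds_needed(total_bytes: int, digest_bytes: int) -> int:
    """Ceiling division: whole hash rounds required to cover total_bytes."""
    return -(-total_bytes // digest_bytes)


needed = key_block_length(mac_key_len=20, enc_key_len=24, iv_len=8)
md5_rounds = rounds_needed(needed, MD5_DIGEST_BYTES)
sha1_rounds = rounds_needed(needed, SHA1_DIGEST_BYTES)
```

Under these assumptions 104 bytes are needed, so the MD5 core runs 7 rounds (112 bytes) and the SHA-1 core runs 6 rounds (120 bytes) before the two streams are truncated and combined with an XOR.
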

[0051] It is determined at 631 if authentication is MD5
or SHA-1. If authentication is MD5 at 631, MD5 inner
and outer hash contexts are generated at 633. Otherwise,
SHA-1 inner and outer hash contexts are generated
at 635. It is determined at 641 whether exportable
data is needed. In one example, export restrictions may
limit the length of the key. If exportable data is needed,
an exportable final write key is generated at 643. If
exportable data is not needed, the process is completed.
It is also determined at 651 whether the key is needed
for a block cipher or a stream cipher. If the key is needed
for a block cipher, an initialization vector that is exportable
is generated at 653. Otherwise, the operations are
complete for TLS 1.0 key derivation. It should be noted
that when the operations are complete, data in temporary
memory may be removed when the processed data
is passed back to an external source. However, instruction
sets for configuring the cryptography accelerator
can remain in persistent memory.
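
One way to read the inner and outer hash contexts generated at 633 and 635 is as precomputed HMAC states: the key is absorbed once into an inner and an outer hash state, and each later message reuses copies of those states. The sketch below illustrates that reading with the standard HMAC construction; the function names and the sample key are assumptions for this example, not details taken from the figures.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # bytes; MD5 and SHA-1 both process 512-bit blocks


def hmac_contexts(hash_name: str, key: bytes):
    """Precompute the inner and outer hash states for a given MAC key."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.new(hash_name, key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")
    inner = hashlib.new(hash_name, bytes(b ^ 0x36 for b in key))  # key XOR ipad
    outer = hashlib.new(hash_name, bytes(b ^ 0x5C for b in key))  # key XOR opad
    return inner, outer


def hmac_with_contexts(inner, outer, message: bytes) -> bytes:
    """Finish an HMAC from copies of the saved contexts."""
    i = inner.copy()
    i.update(message)
    o = outer.copy()
    o.update(i.digest())
    return o.digest()


key = b"hypothetical-write-mac-key"
inner, outer = hmac_contexts("sha1", key)
tag = hmac_with_contexts(inner, outer, b"record payload")
```

Because the contexts are copied rather than consumed, the same pair can authenticate any number of messages without hashing the key block again.
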
[0052] Figure 7 is a process flow diagram showing
SSLv3 key derivation according to various embodiments.
At 701, client and server random information is
saved. At 703, it is determined whether the current session
is a new session. If the current session is a new
session, the master secret is generated by saving the
premaster secret at 705, computing an inner hash using
a SHA-1 component at 707, and computing an outer
hash using an MD5 component at 711. A 48-byte master
secret is saved at 713. The inner hash and outer hash
computations are repeated three times at 715.
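
The three repetitions account for the 48-byte result: each pass produces one 16-byte MD5 output. A minimal software sketch follows, assuming the master secret formula published for SSL 3.0 in RFC 6101, in which each pass prefixes the inner SHA-1 input with a different label ('A', 'BB', 'CCC'). The function name and the all-zero inputs are placeholders for this example.

```python
import hashlib


def sslv3_master_secret(premaster: bytes, client_random: bytes,
                        server_random: bytes) -> bytes:
    out = b""
    for label in (b"A", b"BB", b"CCC"):  # one label per repetition
        # Inner hash: SHA-1 over label, premaster secret, and both randoms.
        inner = hashlib.sha1(label + premaster + client_random + server_random).digest()
        # Outer hash: MD5 over the premaster secret and the inner hash.
        out += hashlib.md5(premaster + inner).digest()
    return out  # 3 x 16 = 48 bytes


master = sslv3_master_secret(bytes(48), bytes(32), bytes(32))
```
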
[0053] If the current session is not a new session, the
number of loops needed is determined at 717. Control
logic can determine the number of loops needed. The
master secret is generated by computing an inner hash
using a SHA-1 component at 719 and computing an outer
hash using an MD5 component at 721. The master
secret is saved at 723. The inner hash and outer hash
computations are repeated based on the number of
loops needed at 725.

[0054] It is determined at 731 if authentication is MD5
or SHA-1. If authentication is MD5 at 731, MD5 inner
and outer hash contexts are generated at 733. Otherwise,
SHA-1 inner and outer hash contexts are generated
at 735. It is determined at 741 whether exportable
data is needed. If exportable data is needed, an exportable
final write key is generated at 743. If exportable
data is not needed, the process is completed. It is also
determined at 751 whether the key is needed for a block
cipher or a stream cipher. If the key is needed for a block
cipher, an initialization vector that is exportable is generated
at 753. Otherwise, the operations are complete
for SSLv3 key derivation.

[0055] Figure 8 is a process flow diagram showing
TLS 1.0 finished message generation according to various
embodiments. Finished messages or verification
messages are used to confirm that two network entities
were successful in key exchange and authentication
processes. The finished message is typically the first
message associated with the recently negotiated algorithms,
keys, and secret information. Network entities
that receive finished messages verify that the contents
are correct.

[0056] In one example, once a client has generated
and sent its own finished message to a server and has
received and validated a finished message from the
server, the client can begin to exchange application-related
data with the server. To generate a finished
message according to TLS 1.0, the master secret is
saved at 801. At 803, the length of the handshake message
used for finished message generation is acquired.
It is determined at 805 whether the handshake message
length is less than 512 bits. If the handshake message
length is not less than 512 bits, both SHA-1 and MD5
hash algorithms are performed on 512-bit blocks of the
handshake message. At 811, intermediate states are
saved. At 813, the last block of the handshake message
is saved. If the handshake message length itself is less
than 512 bits, the handshake message is simply saved
at 813. At 815, the intermediate states are loaded.
[0057] At 817, a final MD5 and SHA-1 hash are performed.
The resulting data is loaded at 819 into a pseudo-random
function. At 821, a 16-byte p_MD5 hash is
generated and at 823 a 20-byte SHA-1 hash is generated.
The results are combined with an XOR at 825. The
client finished message is saved at 827 and concatenated
with the last block at 833. At 835, intermediate
states are loaded and a final MD5 and SHA-1 hash are
generated for the server. The resulting data is loaded at
843 into a pseudo-random function. A 16-byte p_MD5
hash is generated at 851 and a 20-byte SHA-1 hash is
generated at 853. The result is combined with an XOR
at 851. The server finished message is saved at 853.
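
The saving and loading of intermediate states can be illustrated with incremental hashing. In the sketch below, complete 512-bit blocks are hashed once, the resulting states are kept, and copies of those states are finalized twice: once with the last block for the client, and once with the last block followed by the client finished message for the server. The function names and the placeholder byte strings are assumptions for this example; the subsequent pseudo-random function step is omitted.

```python
import hashlib

BLOCK_BYTES = 64  # 512 bits, the block size of both MD5 and SHA-1


def absorb_full_blocks(handshake: bytes):
    """Hash every complete 512-bit block and return the intermediate states
    together with the unprocessed last (partial) block."""
    md5_state, sha1_state = hashlib.md5(), hashlib.sha1()
    full = len(handshake) - len(handshake) % BLOCK_BYTES
    for offset in range(0, full, BLOCK_BYTES):
        block = handshake[offset:offset + BLOCK_BYTES]
        md5_state.update(block)
        sha1_state.update(block)
    return md5_state, sha1_state, handshake[full:]


def final_hashes(md5_state, sha1_state, tail: bytes) -> bytes:
    """Load copies of the saved states, absorb the tail, and finalize."""
    md5_final, sha1_final = md5_state.copy(), sha1_state.copy()
    md5_final.update(tail)
    sha1_final.update(tail)
    return md5_final.digest() + sha1_final.digest()  # 16 + 20 bytes


handshake = bytes(range(256)) * 3 + b"tail"  # placeholder handshake messages
client_finished = b"\x00" * 12               # placeholder client finished message
md5_state, sha1_state, last_block = absorb_full_blocks(handshake)
client_input = final_hashes(md5_state, sha1_state, last_block)
server_input = final_hashes(md5_state, sha1_state, last_block + client_finished)
```

Reusing the saved states means the bulk of the handshake is hashed only once, even though two different finished messages are derived from it.
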
[0058] Figure 9 is a flow process diagram showing finished
message generation for SSLv3. At 901, the master
secret is saved. At 903, the length of the handshake
message is determined. It is determined at 905 whether
the handshake message length is less than 512 bits. If
the handshake message length is not less than 512 bits,
both SHA-1 and MD5 hash algorithms are performed on
512-bit blocks of the handshake message. At 911, intermediate
states are saved. At 913, the last block of the
handshake message is saved. If the handshake message
length itself is less than 512 bits, the handshake
message is simply saved at 913. At 915, the intermediate
states are loaded. At 917, a final MD5 and SHA-1
hash are performed.
[0059] At 919, the master secret is loaded. An outer
MD5 hash and SHA-1 hash are generated for the client
at 921. The client finished message is saved at 923 and
concatenated with the last block at 925. Intermediate
states are loaded at 931. An inner MD5 hash and SHA-1
hash are generated for the server at 933. The master
secret is loaded at 935. At 937, an outer MD5 hash and
SHA-1 hash are generated for the server. The server
finished messages are saved at 939.
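
The inner and outer hashes for SSLv3 finished messages can be sketched as follows, assuming the construction published for SSL 3.0 in RFC 6101: the inner hash covers the handshake messages, a sender constant, the master secret, and a pad of 0x36 bytes, and the outer hash covers the master secret, a pad of 0x5C bytes, and the inner hash. The pad length is 48 bytes for MD5 and 40 bytes for SHA-1. The function name and the placeholder inputs are assumptions for this example.

```python
import hashlib

SENDER_CLIENT = bytes.fromhex("434C4E54")  # ASCII "CLNT"
SENDER_SERVER = bytes.fromhex("53525652")  # ASCII "SRVR"


def sslv3_finished(handshake: bytes, master_secret: bytes, sender: bytes) -> bytes:
    result = b""
    for name, pad_len in (("md5", 48), ("sha1", 40)):
        pad1 = b"\x36" * pad_len
        pad2 = b"\x5c" * pad_len
        # Inner hash over the handshake, sender constant, secret, and pad1.
        inner = hashlib.new(name, handshake + sender + master_secret + pad1).digest()
        # Outer hash over the secret, pad2, and the inner hash.
        result += hashlib.new(name, master_secret + pad2 + inner).digest()
    return result  # 16-byte MD5 part followed by 20-byte SHA-1 part


handshake = b"placeholder handshake messages"
master_secret = bytes(48)
client_msg = sslv3_finished(handshake, master_secret, SENDER_CLIENT)
server_msg = sslv3_finished(handshake + client_msg, master_secret, SENDER_SERVER)
```
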
[0060] Figures 6-9 are process flow diagrams showing
hash operations that can be performed according to
various embodiments of the present invention. The operations
can be performed using components such as
hash cores, XOR components, and temporary memory
configured using instruction sets maintained in persistent
memory. It should be noted that the operations
shown are specified for particular key derivation and finished
message generation operations associated with
TLS 1.0 and SSLv3. However, the techniques and
mechanisms of the present invention should not be restricted
to these two protocols and the specified versions
of these two protocols.

[0061] While the invention has been particularly
shown and described with reference to specific embodiments
thereof, it will be understood by those skilled in
the art that changes in the form and details of the disclosed
embodiments may be made without departing
from the spirit or scope of the invention. It is therefore
intended that the invention be interpreted to include all
variations and equivalents that fall within the true spirit
and scope of the present invention.

Claims

1. A cryptography accelerator, the accelerator comprising:

a first hash core operable to perform a plurality
of rounds of hash computations on input data
to derive processed data;
a persistent memory containing a plurality of instruction
sets, the plurality of instruction sets
providing information for the first hash core on
operations to perform on input data and intermediate
data during the plurality of rounds of
hash computations; and
a temporary memory coupled to the first hash
core, the temporary memory operable to hold
input data and intermediate data.

2. The accelerator of claim 1, further comprising a second
hash core, the second hash core operable to
perform a plurality of rounds of hash computations
on input data to derive processed data.

3. The accelerator of claim 3, wherein the first hash
core is configurable to operate as either a SHA-1 or
an MD5 hash core.

4. The accelerator of any of claims 1-2, wherein the
second hash core is configurable to operate as either
a SHA-1 or an MD5 hash core.

5. The accelerator of claim 4, wherein the first hash
core is configured as the inner hash and the second
hash core is configured as the outer hash for HMAC
operations.

6. The accelerator of claim 2, further comprising control
logic operable to determine characteristics associated
with the input data and select an instruction
set based on the input data characteristics.

7. The accelerator of claim 6, wherein control logic
configures the first hash core using the instruction
set.

8. The accelerator of claim 6, wherein control logic
manages the first hash core using the instruction
set.

9. The accelerator of claim 7, wherein control logic
configures the second hash core using the instruction
set.

10. The accelerator of claim 8, wherein control logic
manages the second hash core using the instruction
set.



11. The accelerator of claim 9, wherein characteristics
associated with input data comprise random information
associated with protocol version, session
ID, and cipher suite.

12. The accelerator of claim 11, wherein characteristics
associated with input data further comprise information
associated with a premaster sequence, an
initialization vector, export information, and key
length.

13. The accelerator of claim 12, wherein characteristics
associated with input data further comprise information
associated with how encryption and authentication
will be performed.

14. The accelerator of any of claims 1-14, wherein the
plurality of instruction sets in persistent memory
comprise instructions for performing TLS 1.0 and
SSLv3 key derivation and finished message generation.

15. The accelerator of claim 14, wherein the persistent
memory and the temporary memory are provided in
the same component.

16. A method for performing hash operations, the method
comprising:

receiving input data;
determining characteristics associated with the
input data;
selecting an instruction set for performing hash
operations on input data, wherein the instruction
set is selected from a plurality of instruction
sets maintained in persistent memory associated
with a first hash core; and
configuring the first hash core using the instruction
set, wherein the first hash core is operable
to perform hash operations on input data based
on the instruction set.

17. The method of claim 16, further comprising:

configuring a second hash core using the instruction
set, wherein the second hash core is
operable to perform hash operations on input
data based on the instruction set.

18. The method of claim 17, wherein performing hash
operations on the input data comprises performing
a plurality of rounds of hash computations on input
data to derive intermediate data and processed data.

19. The method of claim 18, wherein the first hash core
is a SHA-1 hash core.

20. The method of claim 19, wherein the second hash
core is an MD5 hash core.

21. The method of claim 18, wherein the first hash core
is configured as the inner hash and the second hash
core is configured as the outer hash for HMAC operations.

22. The method of claim 21, wherein characteristics associated
with input data comprise random information
associated with protocol version, session ID,
and cipher suite.

23. The method of claim 22, wherein characteristics associated
with input data further comprise information
associated with a premaster sequence, an initialization
vector, export information, and key
length.

24. The method of claim 23, wherein characteristics associated
with input data further comprise information
associated with how encryption and authentication
will be performed.

25. The method of claim 20, wherein the selected instruction
set comprises instructions for performing
key derivation or finished message generation.

26. An apparatus for performing hash operations, the
apparatus comprising:

means for receiving input data;
means for determining characteristics associated
with the input data;
means for selecting an instruction set for performing
hash operations on input data, wherein
the instruction set is selected from a plurality of
instruction sets maintained in persistent memory
associated with a first hash core; and
means for configuring the first hash core using
the instruction set, wherein the first hash core
is operable to perform hash operations on input
data based on the instruction set.